Controlling client access to networked data based on content subject matter categorization

ABSTRACT

An access control technique to limit access to information content such as available on the Internet. The technique is implemented within a network device such as a proxy server, router, switch, firewall, bridge or other network gateway. The access control process analyzes data in each request from the clients and determines if the request should be forwarded for processing by a server to which it is destined. Access control may be determined by comparing client source information against a database of Uniform Resource Locators (URLs), IP addresses, or other resource identification data specifying the data requested by the client. The invention therefore provides access control not based only upon content, but rather, based primarily upon the identity of the computers or users making the requests. The technique further avoids the problems of the prior art, which categorizes or filters the content of only web pages based solely upon objectionable words. This is because a category database is used by the network device to control access and is created via a process involving human editors who assist in the creation and maintenance of the category database.

RELATED APPLICATION

This application is a reissue of application Ser. No. 09/052,236, filed on Mar. 31, 1998, now U.S. Pat. No. 6,233,618 B1.

BACKGROUND OF THE INVENTION

Computer networks, including private intranets and the publicly accessible Internet, have grown dramatically in recent years, to the point where millions of people all over the world use them on a daily basis. The surge in the popularity of computer network use is due in large part to the vast amounts of data and information that are readily available to people at a relatively small cost.

As an example, a computer network application that uses a suite of protocols known as the World Wide Web, or simply “the web”, permits computer users connected to the Internet to “browse” “web pages”. To browse or “surf” the web, a person operates a client computer that executes an application program called a “web browser”. The browser allows the user to submit requests for “web pages”, which are data files stored at remote server computers called “web servers”. The browser may also allow access to other protocols and file types besides web pages. The web servers return the requested pages and/or data to the browser for presentation to the user on the client computer. It is now common for web pages to contain many types of multimedia data including text, sound, graphics, still images and full motion video.

Like many other applications that use computer networks, the web uses various protocols to provide fast and efficient data communication. The process of requesting, sending and receiving web pages and associated data (i.e., surfing the web) over the Internet is handled primarily by a communication protocol known as the Hyper-Text Transfer Protocol (HTTP). However, web browsers and other networking applications can also use many other protocols such as the File Transfer Protocol (FTP), the Telnet protocol, Network News Transfer Protocol (NNTP), Wide Area Information Services (WAIS), the Gopher protocol, Internet Group Management Protocol (IGMP) for use in Multicasting, and so forth. Typically, these protocols use the data communication facilities provided by a standardized network layer protocol known as the Transmission Control Protocol/Internet Protocol (TCP/IP) to perform the data transactions described above.

Unfortunately, none of the aforementioned applications, protocols, nor TCP/IP itself provides any built-in control mechanisms for restricting access to web servers, pages of data, files or other information which the protocols can obtain and provide from servers. Restricted access to servers or data, for example, on the world wide web, may be useful in the home to deny access to objectionable web page material requested by children. A similar need is increasingly felt by information technology professionals in the corporate environment. Within many companies, reliable and ubiquitous access to computer networks is now a requirement of doing business. However, management increasingly feels the need to control Internet access, not only to prevent employees from displaying objectionable material within the workplace, but also to place limits, where appropriate, upon who can access certain information, such as web page content for example, and when this access should be granted. There is increasing concern within many companies, for example, that without some type of control on Internet access, certain workers will spend all day reading web pages devoted to news, sports, hobbies, and the like, or will download entertainment related software, for example via FTP, rather than access the web pages or data files which assist them in doing their job.

Currently available access control mechanisms for networked data are typically provided by either the server software, such as web or database server applications, or the client browser or client terminal software or a combination of both.

Various systems have been developed in an attempt to control access to networked data files in some way. For instance, U.S. Pat. No. 5,708,780 discloses a system for controlling access to data stored on a server. In that system, requests for protected data received at the server must include a special session identification (SID) appended within the request, which the server uses to authenticate the client making the request. If the SID is not present, the server requires an authorization check on the requesting client by forwarding the original request to a special authorization server. The authorization server then interrogates the client that made the request in order to establish an SID for this client. The SID is then sent to the client, and the client can then re-request the protected data using the new SID. In this system, access control is performed by customization of both the client and the server, and requires a separate authentication server.

Other schemes have been developed which place access control responsibility squarely within the client. Typically, these systems use what is known as data-blocking or web-blocking software. This software gets installed onto the client computer and controls the ability of the client browser software to receive data from certain restricted servers. As an example, for restricting access to web pages, client computers can install web-blocking software called Surf-Watch from SurfWatch, Inc., a division of Spyglass Software, Inc. Surf-Watch examines incoming web page data against a restricted content database. When a web page arrives at the client containing, for example, text data including obscenities that are listed in the restricted content database, the Surf-Watch program detects these words and disables the ability of the browser to display the page and informs the user that the page is restricted. This procedure is generally referred to as content filtering, since the actual content of the page or data itself is used to make access control decisions.

The person who administers such software (typically a parent or information technology professional) is responsible for selecting which topics or words of content are to be filtered. For example, Surf-Watch allows the installer to select topics related to sexual material, violence, gambling, and drugs or alcohol. These topics define vocabularies of words that will be used to define the scope of the restricted content database. Any page that is received and that contains a word defined within these categories will not be displayed to the user.

SUMMARY OF THE INVENTION

Prior art systems used for limiting access to data on the networked computers, such as those used for the world wide web, suffer certain drawbacks. For instance, in systems that place access control at the server, it is up to the administrator of the server to decide who should and should not have access to the data being served. Systems using authentication servers also require each client to have knowledge of the access control system in order to correctly append the SID to each request. The separate authentication communication between the server, the authentication server and the client creates additional network traffic, which in turn slows access times considerably, since requests must first be processed by the remote authentication server.

In systems that place access control at the client, it is up to the administrator of each client computer (i.e., the parent or information technology professional) to determine how the access control software is installed and configured on the client computer. Since client browsing and access control software is typically installed on a personal computer, easy access to the operating system and software stored on the computer disk makes it possible for the restricted users (i.e., children or employees) to de-configure or un-install the blocking software, unbeknownst to the administrator. In environments such as schools and corporations, maintaining each client installation of, for example, web-blocking software as a separate system thus becomes a quite cumbersome administrative task.

Furthermore, content filtering based solely upon supposedly objectionable words is not foolproof. For example, a word such as “breast” might be considered to be objectionable, and the blocking software might typically be set to block access to any web page or data file requested that contains that word. However, a web page or FTP site, for example, as published by a respected government research center, may in and of itself not be objectionable simply because it contains pages or files containing that word. Indeed, such a page or file may be highly relevant and even desirable for access by, for example, a high school student performing research for a science project devoted to cancer risks in adult women.

In other instances, there may not be keywords associated with objectionable content. For example, a web page may simply consist of one or more objectionable pictures without embedded keywords. Similarly, an FTP site may simply consist of a directory with one or more graphics files which are objectionable. Content filtering based on keywords does not help with either situation.
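The two failure modes just described can be illustrated with a minimal sketch of a keyword-based content filter. The vocabulary, the function name keyword_filter_blocks and the sample pages are assumptions made only for illustration; they are not taken from any actual blocking product.

```python
# Hypothetical keyword-based content filter of the kind described above.
BLOCKED_WORDS = {"breast"}  # assumed one-word vocabulary, for illustration only

def keyword_filter_blocks(page_text):
    """Return True if any blocked word appears in the page text."""
    words = {w.strip(".,;:!?\"'()").lower() for w in page_text.split()}
    return not BLOCKED_WORDS.isdisjoint(words)

# A legitimate research page is blocked because it contains a listed word.
research_page = "Breast cancer risk factors in adult women."

# A page consisting only of pictures has no text for the filter to match,
# so it passes regardless of what the pictures show.
image_only_page = ""

blocked_research = keyword_filter_blocks(research_page)
blocked_images = keyword_filter_blocks(image_only_page)
```

In this sketch the research page is blocked while the image-only page is not, which is the opposite of the result an administrator would likely want in both cases.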

The present invention overcomes these and other problems of prior art network data access control systems. This invention exists typically as a software program installed on a network device interconnected between typically a first and second computer network. The network device may, for example, be a proxy server, bridge, router, or firewall. The first network may be a local area network (LAN) located, for example, at an Internet service provider (ISP) or within a corporate or other private intranet. The second network may be the Internet or other large wide area network.

The network device is responsible for controlling access by client computers to data available from server computers, when those requests are made via any one or more of a variety of protocols such as HTTP, FTP, Gopher, Telnet, WAIS, NNTP, and so forth. The invention is extendable to provide access control for other types of data access protocols used to transfer data between computers as well, such as protocols that will arrive in the future to perform data exchange or data transactions. The network device includes, typically, a data processor providing a first interface for receiving requests from clients, such as may be connected to the first network, for data stored on servers on the second network.

The network device also includes an access control process coupled to the first interface. The access control process analyzes data in each request from the clients and determines if the request should be forwarded to the second network for processing by a server to which it is destined. The determination to forward or not is made by cross referencing information in the request with access control data in at least one access control database, that may be, for example, stored locally within the network device, but that can be provided from a remote source, such as a subscription service providing periodic access control database updates. By automating the access control database update process, the invention does not have to burden its owners or users with constant maintenance.

The network device also includes a second interface coupled between the first and second network and the access control process. The second interface forwards the requests from the first interface to the servers on the second network if the access control process determines the request should be forwarded to the second network for processing by a server to which it is destined. The information in a request provides the required information, including address data indicating a source of the request and also may include either a Uniform Resource Locator (URL) or an address of the data specifying a specific page of data, a “web page”, a file, or a specific service to be supplied by a remote server to which that request is destined. That is, no matter what the application is, such as world wide web access, FTP access, Telnet access, and so forth, the information in the request identifies the source (i.e., who or which client is making the request) and identifies what server or remote computer will supply data in response to the request. This information is matched to the access control databases of the invention before being allowed to be forwarded from the second interface.

In this manner, the invention provides access control based not primarily upon content, and not at either the server or the client, but rather upon who makes the requests, at what times, and according to different categories of subject matter, as will be explained in detail below.

The invention further avoids the problems of the prior art which categorizes or filters the content of web pages based solely upon objectionable words. For example, the category database used by the network device to control access is preferably created via a process involving human editors who assist in the creation and maintenance of the category database. The editors review the URLs or addresses of new uncategorized web pages, data files, or server machines, and evaluate the content of the web site and web pages or data files or server information referenced by the URL or address, placing that URL or address into one or more of the categories.

The invention also provides for automatic updating of the various access control databases, for example, over the network, so that the access control mechanism is always using the most recently discovered network data which is determined to be restricted in content. Automatic updates may be provided, for example, using SNMP managed network devices that can synchronize local access control database(s) with a master database.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 illustrates an example networked computer environment in which the present invention may be used.

FIG. 2 shows a flow chart of the general processing steps for configuring the databases used by the invention.

FIG. 3 illustrates a simplified example of the contents of a packet as used in this invention.

FIG. 4 shows a flow chart of the general processing steps performed by a network device according to this invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

FIG. 1 illustrates an example networked computer environment 30 in which the present invention may be implemented. The networked computer environment 30 includes a first or Local Area Network (LAN) 40 composed of client computer hosts (“clients”) 50 through 53, a second or Wide Area Network (WAN) 45 including server computer hosts (“servers”) 54 through 56, and a network device 100 having access control databases 203, 204 and 208. The network device 100 is connected to permit data communication between the Local Area Network 40 and Wide Area Network 45, and is in particular configured according to the present invention to provide an access control mechanism for all data information requests made from clients to servers, such as, for example, web page, news server, or FTP data or application download requests.

While the invention is applicable to many types of data transfer operations made from client to server computers, the preferred embodiment described herein relates primarily to world wide web page access. However, it is to be understood that the invention is applicable to access control to other types of data provided by other protocols such as Gopher data provided by Gopher servers, FTP servers, Usenet News servers, Multicast Backbone (MBONE) Servers, and so forth. The invention may also be used to restrict access to actual application software provided by servers, such as, for example, JAVA™ applets served from dedicated application servers. (JAVA™ is a trademark of Sun Microsystems™, Inc., Santa Clara, Calif., U.S.A.).

In FIG. 1, the Local Area Network (LAN) 40 inter-networks the clients 50 through 53, and the Wide Area Network (WAN) 45 inter-networks the servers 54 through 56. WAN 45 may be, for example, the Internet, and LAN 40 may be, for example, any type of computer network such as one used in a corporate, institutional, Internet service provider (ISP) or similar setting in which multiple computers access each other and the WAN 45. The LAN 40 and/or WAN 45 may be implemented using Ethernet, ATM, FDDI, SONET, token-ring, wireless or other types or combinations of physical network layer topologies.

The clients and servers 50 through 56 may be workstations, personal computers, or other data processing devices linked via the LAN and WAN communication mediums which operate a protocol that supports high-speed data communications, such as, for example, the Transmission Control Protocol/Internet Protocol (TCP/IP).

The LAN 40 is coupled via a network link 41 to the network device 100, which is in turn coupled to the WAN 45 via network link 46. The network device 100 may be, for example, a router, proxy server, firewall, bridge, hub, switch, or other data transfer, switching or network device that allows data, usually in the form of frames, packets or datagrams, to be transferred back and forth between the LAN 40 and the WAN 45. In the context of this invention, network device 100 is usually owned and administered by the same organization that owns and administers the LAN 40. The network device 100 serves as the “gateway” through which all data communications must pass between the two networks 40 and 45. Such a gateway may be located at an Internet service provider (ISP) wherein the clients are connected to the LAN via dial-up modems, or within a corporate or other institutional environment, between the LAN and an Internet connection. While not shown, it is noted that the invention may employ more than one network device 100 to provide access control to clients on LAN 40 between many different WANs or to the same WAN 45.

As a “gateway”, the network device 100 according to this invention is configured also to monitor the data communications that pass between clients connected to the LAN 40 and servers connected to the WAN 45. The network device 100 can, for example, detect requests for web pages, files or other data from any of clients 50 through 53 to servers 54 through 56. The network device 100 then either allows or denies the detected web page or information requests based on an examination of the content of the specific requests in comparison with access control data stored in databases 203, 204 and 208.

By locating the access control decisions in neither the server nor client computers 50-56, but rather, within network device 100, web page and data access for all clients 50 through 53 may be controlled as a group, without any separate client or server configuration required from the administrator who operates the network device 100. Also, since a firewall, bridge, router or gateway to the Internet, for example, is typically isolated from physical and logical access by users, a trusted systems administrator can be responsible for administering an access control policy which is more difficult to circumvent than when left up to the users of the clients or servers.

In order for network device 100 to be able to make access control decisions regarding requests for web pages, files or other information provided by servers, it must be configured with access control data such as stored in databases 203, 204 and 208. The access control data defines which clients can access which web pages or data from remote servers at what times and under what conditions. Users of the client computers in this invention are assigned to various groups, which may, for example, be based on that person's responsibilities within the organization that is using the system of this invention. If a user is in a particular group, the invention can further limit access to, for example, web pages, data, programs, files or documents for that group at certain times, while not limiting access at other times. Still further, this invention provides the ability to limit access to web pages or data provided by servers that fall into many different categories. That is, access control is provided based on the categories or types of data to be accessed, on groups of users, and on the time during which access is requested.

As an example, in a high school environment having a LAN within the school, the network device of the invention can have access control databases configured to restrict access to a remote network server that serves (i.e., allows remote playing of) JAVA™ applet chess games. The network device will allow access to the server only by the chess club members of the school and only if they are using the chess club computers in the chess club meeting room and only during chess club meeting hours. Other users of the school's LAN computer network using computers located elsewhere in the school at different times (or even during chess club hours) can be restricted from accessing this server over the Internet using the invention.

An explanation of the databases 203, 204 and 208 will clarify the nature of the access control capabilities of the invention.

Database 203 is called the group/source database. A simple example of the data in this database is shown in Table 1.

TABLE 1 Group/Source Data

GROUP        SOURCE
LIBRARY      CLIENT 50, CLIENT 51
FACULTY      CLIENT 52
PRINCIPAL    CLIENT 53

In FIG. 1, each client computer 50 through 53 may be associated with one or more groups used for access control in this invention. Suppose, for example, that LAN 40 is used within an elementary school system and the group/source database 203 in Table 1 is configured for such an environment. Client computers 50 and 51 may be located in the library, while client computer 52 may be located in the faculty lounge, and client computer 53 may be in the principal's office. Accordingly, in this example, the group/source database 203 may list three groups in column 1 of Table 1: library, faculty, and principal. Each group will have one or more associated client addresses (i.e., sources) and/or usernames identifying which users (via which client computers) are in which groups. Column 2 in Table 1 associates each source client computer to a group.

In the example shown in Table 1, client computer numbers are used. In a preferred embodiment, the computer numbers used by the group/source database 203 are preferably machine addresses (i.e., Internet Protocol (“IP”) or Media Access Control (“MAC”) addresses, as will be described below) to identify sources, or sources may be broken down even further to the username level, such that no matter which client computer a specific user logs in at, that user will always be associated with his or her respective group. In such a case, groups would have sources containing usernames, instead of hostnames, or sources may be username/hostname pairs. As will be explained, the group/source database 203 will be used to determine who is requesting the information over the network, such as web page data for example, and what their level of access is.
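The group/source lookup can be sketched as follows. This is only a minimal illustration under stated assumptions: the dictionary GROUP_SOURCE is a hypothetical in-memory stand-in for database 203 populated from Table 1, and the function name groups_for_source is likewise an assumption. Real entries would be IP or MAC addresses, usernames, or username/hostname pairs.

```python
from typing import Optional

# Hypothetical in-memory stand-in for group/source database 203 (Table 1).
# A source may be a host identifier, a username, or a (username, host) pair.
GROUP_SOURCE = {
    "LIBRARY": {"CLIENT 50", "CLIENT 51"},
    "FACULTY": {"CLIENT 52"},
    "PRINCIPAL": {"CLIENT 53"},
}

def groups_for_source(host: str, username: Optional[str] = None) -> set:
    """Return every group whose sources include the host, the username,
    or the (username, host) pair."""
    candidates = {host}
    if username is not None:
        candidates.add(username)
        candidates.add((username, host))
    return {group for group, sources in GROUP_SOURCE.items()
            if sources & candidates}

library_groups = groups_for_source("CLIENT 51")
unknown_groups = groups_for_source("CLIENT 99")
```

A source that matches no group yields an empty set; how such a request is treated (allowed or denied by default) is a policy choice that this sketch does not decide.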

Table 2 below provides an example of the data contained in the group/category database 204.

TABLE 2 Group/Category Data

GROUP        RESTRICTED CATEGORIES
LIBRARY      1, 7, 9, 11, 18, 19, 22, 24, 28
             TIME: 1-4 pm
FACULTY      1, 9, 18, 19, 24
             TIME: 8 am-11:59 am, 1 pm-4 pm Monday-Friday
PRINCIPAL    4, 13, 14, 16, 17, 20, 21, 23, 25, 26, 27
             TIME: 2-4 am, 6-11 pm

As shown in Table 2, data contained in the group/category database 204 associates each group with the restricted categories for that group and other access attributes such as the time of day during which those groups are restricted. For instance, a user of a client computer who is in the faculty group will be restricted from viewing web pages that fall into categories 1, 9, 18, 19 and 24 from 8 am to 11:59 am (i.e., morning work hours) and from 1 pm to 4 pm (i.e., afternoon working hours) during every Monday through Friday (i.e., workdays). The principal of the school, however, is allowed to access all Internet servers, web sites, and data at all hours except from 2 to 4 am and 6 to 11 pm. As will be explained shortly, each category is associated with a specific topic, such as sex, violence, drugs, and so forth. In one embodiment of this invention, there are thirty different categories. Thus, if a user of a client computer is excluded from certain categories, when they make a request for a web page or a server location or a data file having an Internet access address that appears in one of those categories in the category/destination database 208 (to be explained), that user will be denied access to that data, file, applet, web page, and so forth.
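The combination of restricted categories and time windows can be sketched as a simple check. The rule layout below (a dictionary with "categories", "weekdays" and "windows" keys) and the function name is_restricted are assumptions chosen for illustration; only the values come from the FACULTY row of Table 2.

```python
from datetime import datetime, time

# Hypothetical encoding of the FACULTY row of Table 2.
FACULTY_RULE = {
    "categories": {1, 9, 18, 19, 24},   # restricted category numbers
    "weekdays": {0, 1, 2, 3, 4},        # Monday=0 through Friday=4
    "windows": [(time(8, 0), time(11, 59)),
                (time(13, 0), time(16, 0))],
}

def is_restricted(rule, category, when):
    """Return True if the category is restricted for this rule at the
    given date and time."""
    if category not in rule["categories"]:
        return False
    if when.weekday() not in rule["weekdays"]:
        return False
    return any(start <= when.time() <= end for start, end in rule["windows"])

# Tuesday morning, inside the first window.
tuesday_morning = is_restricted(FACULTY_RULE, 9, datetime(1998, 3, 31, 9, 30))
# Tuesday lunch hour, between the two windows.
tuesday_lunch = is_restricted(FACULTY_RULE, 9, datetime(1998, 3, 31, 12, 30))
```

Under these assumptions a faculty request for a category 9 destination would be denied at 9:30 am on a weekday and permitted at 12:30 pm, or at any time on a weekend.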

The data in databases 203 and 204 may be configured by the administrator of the system. The data may be stored in any form of database format, such as in a relational database format, for example. It is noted that databases 203, 204 and 208 must be accessible to network device 100, but need not be located within or directly attached to network device 100. For instance, a file server using the Network File System (NFS) can be used to provide network device 100 access to databases 203, 204 and 208, even though the disks storing the data are located elsewhere on LAN 40, for example. Alternatively, the databases 203, 204 and 208 can reside in the network device itself.

The third database used by network device 100 for access control is the category/restricted destination database 208. This database is a key element of the invention, and provides a list of the Uniform Resource Locators (URLs), including URL segments, and IP addresses, for servers containing restricted files, applets, documents, web pages, news groups, Multicast sessions or other content, for each category. The size of the database 208 can vary and may be very large in some instances. An abbreviated example of the contents of the category/restricted destination database is given in Table 3.

TABLE 3 Category/Destination Data

CATEGORY                   URLS                  URL SEGMENTS                   IP ADDRESSES
1. Alcohol                 alcohol.com           /www.drink.com/margarita       12.34.105.23
                           www.drink.com                                        213.56.3.12
                           www.intoxicated.com                                  224.0.0.0
2. Alternative Lifestyle   /www.hermit.com/      /www.recluse.com/hate-people   201.2.123.67
                                                                                145.23.1.231
. . .                      . . .                 . . .                          . . .

In Table 3, each category is listed as a number, along with its name indicating the subject matter associated with that category. There are only two categories shown in this example for ease of description. The categories are matched in Table 3, and in database 208, with the server addresses, including document locations (e.g., locations of web pages via URLs) and IP addresses, which are to be restricted for a group having those categories. For instance, category 1 is alcohol. In columns 2, 3 and 4 of this category, URLs, segments of URLs and IP addresses are listed which indicate the addresses of files, documents, web pages, web sites and other information on the network, Internet, or world wide web that are restricted for access within that category. For instance, under the category alcohol, no access is allowed to the web site in column 2 listed as alcohol.com, and no access is allowed for requests to the IP address 213.56.3.12, which may correspond, for example, to the home page of a bar, brewery, or other drinking establishment.

As another example, in the IP Addresses column in Table 3, IP address 224.0.0.0 is listed, which corresponds to a special type of IP address reserved for Multicast Broadcast data streams. Thus, access to Multicast data streams accessed via user applications running on clients 50 through 53 may be restricted as well, through the use of this invention. This example illustrates that the invention is applicable to restricting access to data other than just world wide web page or URL data. Those skilled in the art will now readily understand that other address mechanisms which may be similar in nature to URL or IP addresses may be incorporated into the access control databases of this invention to restrict access to the locations of data, documents, files or the like over a computer network.
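Matching a requested destination against the category/restricted destination database can be sketched as follows. The structure CATEGORY_DESTINATIONS is a hypothetical stand-in for database 208 holding only the two abbreviated rows of Table 3, with leading and trailing slashes dropped for simplicity; the function name categories_for_destination and the prefix-matching rule for URL segments are assumptions, since the text does not spell out the matching algorithm.

```python
# Hypothetical stand-in for category/restricted destination database 208,
# populated from the abbreviated rows of Table 3 (slashes normalized).
CATEGORY_DESTINATIONS = {
    1: {  # Alcohol
        "urls": {"alcohol.com", "www.drink.com", "www.intoxicated.com"},
        "segments": {"www.drink.com/margarita"},
        "ips": {"12.34.105.23", "213.56.3.12", "224.0.0.0"},
    },
    2: {  # Alternative Lifestyle
        "urls": {"www.hermit.com"},
        "segments": {"www.recluse.com/hate-people"},
        "ips": {"201.2.123.67", "145.23.1.231"},
    },
}

def categories_for_destination(host, path="", ip=None):
    """Return the category numbers whose entries match the destination."""
    target = (host + "/" + path.strip("/")).rstrip("/")
    matched = set()
    for number, entry in CATEGORY_DESTINATIONS.items():
        if host in entry["urls"]:
            matched.add(number)          # whole site is listed
        elif any(target == seg or target.startswith(seg + "/")
                 for seg in entry["segments"]):
            matched.add(number)          # request falls under a listed segment
        elif ip is not None and ip in entry["ips"]:
            matched.add(number)          # destination IP address is listed
    return matched

segment_hit = categories_for_destination("www.recluse.com", "/hate-people/index.html")
ip_hit = categories_for_destination("example.org", ip="213.56.3.12")
```

In this sketch a request under a listed URL segment is matched even though the site as a whole is not listed, and a request to an unlisted host name is still matched when its IP address appears in the database.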

In this invention, the category database 208 is created separately for the operation of the network device 100, for example, by a third party other than the owner and administrator of the network device 100. That is, since the category database must contain, for example, all of the web site URLs, home page addresses, IP addresses, news groups, data and file locations, and other information indicating destinations for requests that are to be restricted, this information can become quite voluminous, and in a preferred embodiment, is created as a single master database 208.

Access to the master category database 208 may be incorporated into the network device 100 in various ways, each of which is within the scope of this invention. For example, as noted previously, the category database 208 may be stored and updated in a database locally on a hard disk within the network device 100, using update disks periodically loaded onto the network device 100. Alternatively, the category database 208 may be provided to the network device using a protocol, such as the Simple Network Management Protocol (SNMP), which may use an agent running locally on the network device 100 to control network device configuration and database content from a remote network manager station, which can be controlled by a third party offering a subscription to periodic database updates. Thus, any organization implementing the present invention can merely receive a copy of the category/restricted destination database 208 for use with their system without having to be concerned with the installation of the data.

Since the Internet topology, IP addresses, server locations, and the World Wide Web are all constantly changing, and URLs, web servers, news sites, Multicast channels, and so forth are all being added to and removed from networks such as the Internet on a daily basis, using this invention, one organization can keep the master category database 208 current and up to date, and each organization that uses the database 208 in conjunction with their own network device 100 can subscribe to, for example, a monthly update or subscription service. In this manner, using SNMP or an automated download service, for example, the database 208 may be distributed to the network devices 100 of all subscribing organizations for use, and each organization need not worry about keeping their category database 208 current with the current state of the world wide web. The entire update process may be done over either LAN 40 or WAN 45, without the need for sending physical disk media through the mail or postal service.

FIG. 2 shows the processing steps involved according to this invention to configure network device 100 with the access control database 208. Step 150 provides an automated network-walker whose function is to continually examine the world wide web, and any other accessible networked data servers, for new addresses, files, web sites, home pages, documents, Multicast channels, and so forth. The network-walker is an automated knowledge robot software process which continually surfs the web and examines Internet content providers to gather newly found URLs and IP addresses of web servers or other content providing computers.

For purposes of this explanation, the term URL, for Uniform Resource Locator, refers to the location of any type of content on a computer network, and not just to web pages or information obtained via HTTP. Thus, each time a new URL or address of a content server is obtained or discovered by the network-walker, step 151 checks to determine if the new URL is contained in any one of three databases. The first database is a URL queue database 152 that stores the new URLs in incoming order for processing by subsequent steps. If the new URL in step 151 is not in the URL queue database 152, an uncategorized URL database 153 is then checked. Database 153 holds URLs that must be categorized, as will be explained. If the new URL at step 151 is not in databases 152 or 153, the category/restricted destination database 208 is checked. If the new URL is already in any one of databases 152, 153 or 208, the URL is discarded, in step 159. If the new URL is in none of these databases 152, 153 or 208, step 151 places the new URL into the URL queue database 152.
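By way of illustration only, the duplicate check of step 151 can be sketched as follows. The function name, the in-memory containers standing in for databases 152, 153 and 208, and the sample URLs are hypothetical and do not come from the specification. The sketch also assumes that a URL already present in any of the three databases is treated as a duplicate and discarded.

```python
from collections import deque


def handle_discovered_url(url, queue_152, uncategorized_153, category_208):
    """Sketch of step 151: queue a newly discovered URL unless it is already known.

    Returns True if the URL was appended to the queue, False if it was
    discarded (step 159). All three containers are hypothetical stand-ins
    for databases 152, 153 and 208.
    """
    if url in queue_152:           # already waiting to be processed
        return False
    if url in uncategorized_153:   # already waiting for human review
        return False
    if url in category_208:        # already categorized
        return False
    queue_152.append(url)          # new URL: keep incoming order
    return True


# Hypothetical sample contents.
queue_152 = deque(["www.aaa.com"])
uncategorized_153 = {"www.bbb.com"}
category_208 = {"www.xxx.com": {"sexual material"}}
```

A deque is used for the queue only because it preserves incoming order; a real deployment would presumably use persistent storage.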

Step 154 gets the next URL from queue database 152 and determines the network address (i.e., IP address) of the server (for example, one of web servers 54, 55 or 56 in FIG. 1) that provides the content of the URL, and determines any URL segments within this URL. A URL segment may be a sub-page, for example, that may exist below a home web page. For example, if the URL is www.xxx.com, a segment of this URL may be www.xxx.com/pornography/photos.

Alternatively, in another example, if the URL represents a news server using NNTP to propagate news groups over a network, the URL may include the IP address of the news server and URL segments may represent individual news groups offered by that server. As another example, if the URL is the IP address representing a Multicast address of a channel of real time audio and/or video information, a URL segment may be represented by Multicast addresses of sub-channels within the domain of the IP Multicast address. Thus, if the network-walker detects a new Multicast channel being broadcast on address 224.0.0.0, the network-walker may log 224.0.0.1, 224.0.0.2, and so forth as Multicast sub-channels or URL segments in this invention within queue database 152.
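For the web-page case described above, one way to enumerate a URL together with the sub-pages above it is sketched below. This is illustrative only. The function name is hypothetical, and treating each successive path prefix as a "segment" is an assumption about how segments might be derived, not a definition taken from the specification.

```python
from urllib.parse import urlsplit


def url_segments(url):
    """Return the host followed by each successively longer path prefix.

    Hypothetical helper: the specification does not prescribe how URL
    segments are computed.
    """
    if "://" not in url:
        url = "//" + url            # let urlsplit recognise a bare host
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [host]
    current = host
    for piece in parts.path.split("/"):
        if piece:                   # skip empty pieces from leading or trailing "/"
            current = current + "/" + piece
            segments.append(current)
    return segments


print(url_segments("www.xxx.com/pornography/photos"))
```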

Step 154 also attempts to obtain a description of this URL by accessing, for example, the home page to which a web-page URL refers. A description of a home page, and hence its URL, may exist in the Hypertext Markup Language (HTML) that is used to actually create and format the data which comprises an actual web page. In an alternative example, in the case of a URL that is only an IP address or a Multicast address, other identification about the content server provider may be obtained, for example, by using the “whois” internet network information service or another similar protocol-based information service. “Whois” is a protocol that is used in conjunction with an IP address, by issuing, for example, the command “whois 224.0.0.0” and awaiting a response. A Multicast server that is properly configured typically returns an indication of who owns and administers the server machine at the specific IP address that is providing the content, as specified in the “whois” protocol, and also returns information concerning the IP Multicast address content. This description and information received is obtained and stored by step 154.

In the www.xxx.com example, step 154 may obtain, for example, a page or meta-description of the entire web site that may look something like “www.xxx.com is an adult oriented site supplying pornographic images to web browsers.” In the Multicast example, whois may return “224.0.0.0 is an internet Multicast channel served from a SUN™ Workstation at XYZ Corporation and is dedicated to providing real-time audio and video information on religious activities.” (SUN™ is a trademark of Sun Microsystems™, Inc., Santa Clara, Calif., U.S.A.). This description is saved in step 154, since it may be relevant for determining the category of the web site or content server, which in the first case is sexual material, and in the latter case is religious material.
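For the web-page case, the description gathered in step 154 could, for example, be read from a page's HTML metadata. The sketch below is illustrative only. The class and function names are hypothetical, the sample page is invented, and falling back to the page title when no meta-description exists is an assumption rather than something the specification requires.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the meta description and the title of an HTML page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe_page(html_text):
    """Return the meta description, else the title, else None."""
    parser = DescriptionParser()
    parser.feed(html_text)
    return parser.description or parser.title.strip() or None


# Invented sample page for illustration.
SAMPLE = (
    '<html><head><title>XXX</title>'
    '<meta name="description" content="www.xxx.com is an adult oriented site.">'
    '</head><body></body></html>'
)
print(describe_page(SAMPLE))
```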

Next, in step 155, the new URL and its associated data gathered in step 154 are placed into the uncategorized database 153 until the server, data stream or web site for this new URL can be examined for content by a person in order to precisely associate one or more categories with this URL.

In step 156, a person who assists in the creation and maintenance of the category/restricted destination database 208 reviews the next URL at the top of the list of URLs in the uncategorized URL database 153. In step 156, the person may use, for example, a web browser to visit the actual web site specified by the URL, or may use a Multicast receiver application or a news reader application to view the data provided by the server specified in the current URL. While visiting the web page or examining or listening to or viewing the data provided from the server listed in the URL and that URL's associated URL segments, the person, in step 157, makes a determination about the content of the server (e.g., a web site) referenced by the URL and places that URL into at least one, and typically more than one, of the categories in the category/restricted destination database 208. Using the previous examples, the www.xxx.com web-site URL would be placed into the pornography or sexual material category and the religious Multicast channel would be placed into the religious category. Accordingly, at step 157, that server or web site or content provider and its associated pages, data streams, files, news groups, and so forth are now in the database 208 which can be used for access control. Finally, in step 158, the URL associated with the data is removed from the uncategorized database 153.

While not shown in FIG. 2, processing continually repeats itself, and many concurrent iterations of the processing steps 150 through 158 may be taking place at one time. Accordingly, there may be a number of different people in step 156 that have the job of reviewing and categorizing content provided by servers, web pages and web sites, IP addresses, Multicast addresses, news groups, public mail servers, etc. Moreover, the network-walker in step 150 is continuously obtaining new information about current content providers on the computer network, such as the Internet. These tasks, and the processing of FIG. 2, are typically performed by the service organization that provides the category database 208 to all of the subscribers who utilize this aspect of the present invention with their network device 100, in order to have up to date access control provided to their LAN 40.

In this manner, by processing the steps of FIG. 2, a very thorough category/restricted destination database 208 is created and maintained. The network-walker function in step 150 is constantly examining the network (i.e., the Internet, World Wide Web, etc.) for the latest URLs that come into existence, and they are then processed as described above.

It is to be understood that the processing steps in FIG. 2 are typically not performed by the network device 100, though the administrator of LAN 40, who may control network device 100, could, if he or she wanted to, perform the processing of FIG. 2 in order to add other URLs to database 208. However, in a preferred embodiment, network device 100 merely obtains access to databases 203, 204 which are locally configured during the setup of each network device 100. Database 208 is accessed locally, but is routinely updated by downloading or automatically transferring (e.g., via an SNMP agent or FTP) the latest created version from a centralized location such as a provider of a subscription service to the database 208. Once each of the databases 203, 204 is configured and database 208 is downloaded and made available to the network device 100 somewhere on LAN 40, the network device 100 can then operate to provide complete access control of servers, web pages, and other types of content for users of the client computers 50-53 connected to LAN 40, according to the aforementioned aspects of the invention.

In operation of the access controlled network computer environment 30 according to the access control aspect of the invention, one or more client computers 50 through 53 are configured with standard web browsing or content accessing application software (not shown) such as, for example, the commonly known web browser produced by Netscape, Inc. entitled Netscape Navigator®, or Microsoft® Corporation's browser software entitled Microsoft® Internet Explorer®. (Netscape Navigator® is a registered trademark of AOL® LLC, New York, N.Y., U.S.A.; Microsoft® and Internet Explorer® are registered trademarks of Microsoft® Corporation, Redmond, Wash., U.S.A.). Another example of content accessing software is an Internet Radio program that joins a Multicast group in order to listen to real-time audio. The browser or content application software need not be modified or customized in any way for this invention to work properly. The clients, browsers and content applications need not actually be part of the invention, but rather, benefit from the invention's access control capabilities. The browsers or applications on each client computer 50 through 53 allow users to request pages or data or other information from server computers 54 through 56 on the Internet, while still being subject to access control provided by the network device and its configuration and databases provided by the invention.

As an example, for client 52 to request a web page from server 55, client 52 uses the Hyper-Text Transfer Protocol, which operates in conjunction with TCP/IP, to produce a packet of data (not shown in FIG. 1) that gets sent from the requesting client 52 onto the LAN 40 to be forwarded and received by server 55. In the invention, based on the contents of the packet sent from client 52, a determination may be made in network device 100 as to whether or not the request should be forwarded to WAN 45 and thus to server 55. As another example, if a client application desires to receive Multicast packets of Internet packet radio broadcasts, client 52 uses the Internet Group Management Protocol (IGMP) to produce a packet requesting to join a specific Multicast group. The IGMP request must pass through network device 100 in order to obtain Multicast Group access to a server supplying the Multicast data.

In order to explain how the network device 100 operates as an access control system for all data requests from client computers 50 through 53 on LAN 40, a brief explanation of network packet communications and content is needed.

FIG. 3 shows a highly simplified example breakdown of the contents of a data packet 300 that carries a request for a web page from client 52 to a server 44. Access to a web page will be used in this description, but other content services using other protocols are applicable to this invention as well. Packet 300 contains fields 301 through 305. It is to be understood that packet 300 is highly simplified and does not reveal all of the fields or contents of packets typically used in data communications. Rather, the packet 300 illustrates only those fields needed to understand the concepts of this invention.

Packet 300 includes a beginning field 301 recognizable by network device 100 as the start of a packet, and an ending field 305 recognizable as the end of the packet. The source address field 302 indicates the source of the data packet, which is the network address of the client computer sending the request. Source address field 302 may contain, for example, IP and/or Media Access Control (MAC) addressing information. The destination address field 303 indicates the destination network address of a remote server computer that is to receive packet 300, and may also contain IP and/or MAC layer addressing information. The data field 304 is used to transport the data or payload of the packet from the browser application (e.g., Netscape) on the client 52 to the web server software operating on the web server 55. In the example shown, the data field 304 contains the request in the form of a full Uniform Resource Locator (URL) for a web page. A URL serves as the indicator of the request from the client for a specific web page stored on one of the servers, and can be detected by network device 100.

As noted previously, to perform access control, packet information is compared against database information within network device 100. FIG. 4 shows the processing steps performed by network device 100 to perform access control according to this invention. Since network device 100 serves as a gateway, router, proxy server or other data transfer mechanism to the WAN 45 from the LAN 40, the network device 100 can also monitor the contents of outgoing packets traveling from LAN 40 to WAN 45 for such data as HTTP level request messages for URLs, such as an HTTP “GET” message. As noted previously, other requests for other types of network content provided by servers, such as news group requests, IGMP Multicast group join requests, FTP file transfer requests, and so forth may also be incorporated into the monitoring facilities of network device 100 in this invention. During this monitoring process, in step 200, the network device 100 receives and detects a packet containing, in this example, an HTTP request in data field 304 of the packet. The detection can be done, for example, using an application programming interface (API) that allows the network device 100 to screen any selected packet field for information, such as addresses and data in all outgoing packets. The network device 100 can, using an API provided, for example, by proxy server software running on the network device 100, also detect the IP port, TCP socket and/or session numbers with which packets are associated. HTTP and most other network protocols typically associate themselves with a specific port, socket, IP address, session number, or other unique identifier within TCP/IP, which is another way the network device 100 can detect the presence of a packet containing a request for a web page, data file, audio or video stream, news group, file transfer, and so forth.
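The simplified packet of FIG. 3 and the detection of an HTTP “GET” request in step 200 can be sketched as follows. This is illustrative only. The class name, field names and function name are hypothetical, the framing fields 301 and 305 are omitted, and a real network device would obtain these fields through its own packet-screening API rather than from a Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet300:
    """Simplified stand-in for packet 300 (fields 301 and 305 omitted)."""
    source_address: str        # field 302
    destination_address: str   # field 303
    data: bytes                # field 304


def requested_url(packet):
    """Return host plus path for an HTTP GET in the data field, else None."""
    try:
        text = packet.data.decode("ascii")
    except UnicodeDecodeError:
        return None
    lines = text.split("\r\n")
    request = lines[0].split(" ")
    if len(request) != 3 or request[0] != "GET":
        return None
    target = request[1]
    if "://" in target:                    # absolute form, as sent to a proxy
        return target.split("://", 1)[1]
    for line in lines[1:]:                 # origin form: host is in a header
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip() + target
    return None


# Invented example request from a client to a web server.
packet = Packet300(
    source_address="10.0.0.52",
    destination_address="198.51.100.7",
    data=b"GET /pornography/photos HTTP/1.0\r\nHost: www.xxx.com\r\n\r\n",
)
print(requested_url(packet))
```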

In the web access example, once a web page request is detected in a packet, in step 201, the source address of the packet in field 302 is examined. The source address may be an IP address, or a MAC address, or an address/username combination. Then, step 202 matches the source address and data with the group/source database 203 (i.e., Table 1) in order to determine the group in Table 1 to which the packet containing the HTTP request belongs. In other words, the packet came from one of clients 50 through 53. Hence, step 202 matches packet information to group information such as that shown in Table 1, in order to determine which client and/or user on LAN 40 is sending this particular web page request packet and determine what group that machine or machine/username combination is in within database 203.

Next, step 205 obtains the active categories for the group determined in step 202, by consulting the group/category database (i.e., Table 2). Thus, step 205 obtains a list of all of the categories which are to be consulted to see what restrictions are placed on the requested URL, IP address, or other content destination. That is, step 205 determines what groups can access what categories of content and when. Note that the categories are referred to as active since they are only selected for checking in step 205 if the time of day listed for those categories is applicable at the current time, based on the current system clock time in the network device 100. That is, step 205 determines, based on the identification of the group of the person or client requesting the page or data in step 202, which categories for that group (i.e., the person requesting the page or data) are restricted and at what times those categories for that person (i.e., that group) are restricted.
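The time-of-day selection of step 205 can be sketched as follows. This is illustrative only. The table layout, group names, category names and restriction windows are hypothetical and are not the contents of Table 2. The sketch also assumes each restriction window falls within a single day; a window spanning midnight would need separate handling.

```python
from datetime import time

# Hypothetical stand-in for the group/category database:
# group -> list of (category, restricted_from, restricted_until).
GROUP_CATEGORY = {
    "students": [
        ("sexual material", time(0, 0), time(23, 59, 59)),
        ("games", time(8, 0), time(15, 0)),
    ],
}


def active_categories(group, now):
    """Return the categories restricted for this group at clock time `now`."""
    active = set()
    for category, start, end in GROUP_CATEGORY.get(group, []):
        if start <= now <= end:
            active.add(category)
    return active


print(sorted(active_categories("students", time(10, 0))))
```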

Step 206 then obtains the actual URL and the destination IP or other type of address from the data field 304 and the destination field 303, respectively, of the packet sent by the client. Step 207 then matches the IP address, the URL, or any segment of the URL against each category obtained in step 205 in the category/restricted destination database 208. In step 207, then, each category specified as being active for the group of the client requesting the web page or data is consulted to see if the requested page or data is listed in any of the URL or IP data associated with that category.

In step 209, if the IP address, the URL or any segment of the URL matches any restricted destination information (i.e., columns 2, 3 or 4 of Table 3) for any of the categories obtained in step 205, then step 210 is executed, which denies access to the web page, data, service or content requested in the packet received from the client at the network device 100. In other words, step 210 does not forward the packet on to the content server indicated in the destination field 303 of the packet if the client in the specific group was requesting a page or data or a service that existed in the category database 208 for one of the categories that was active for that group. Quite simply, the client was trying to access a restricted web site or URL or IP address or service; step 209 detects this information in one of the active categories in database 208, and step 210 can deny access.

If step 209 does detect an attempt at restricted access to a service, web site, data or other restricted content, step 214 is executed, which uses the source address in field 302 of the packet 300 to send a return notification of denial to the user at the client computer requesting the restricted data. Step 215 may also be executed, which logs the illegal attempted request to a log file.

However, if step 209 determines that neither the IP address, the URL, nor any URL segment matched any of the restricted data for any of the active categories obtained in step 205, then step 211 allows the request to be forwarded to the content server through network device 100. In other words, the request was for legitimate non-restricted web pages, services, or data provided by a server on WAN 45. Once the request is received by the server to which it was destined, the server begins to return the requested data in the form of a web page, a file transfer, a news group, or other data.
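The allow or deny decision of steps 201 through 211 can be sketched as follows. This is illustrative only. The three lookup tables, their contents, the addresses and the function names are hypothetical stand-ins for databases 203, 204 and 208. Time-of-day handling is omitted, URL segments are assumed to be path prefixes, and sources missing from the group table are assumed to fall into a restrictive default group.

```python
# Hypothetical stand-ins for the databases; none of these values come from
# Tables 1 through 3.
GROUP_SOURCE = {"10.0.0.52": "students", "10.0.0.50": "staff"}
ACTIVE = {
    "students": {"sexual material"},
    "staff": set(),
    "default": {"sexual material"},
}
CATEGORY_208 = {"sexual material": {"www.xxx.com/pornography", "192.0.2.10"}}


def candidates(dest_ip, url):
    """Destination IP plus every path prefix of the URL (assumed segments)."""
    pieces = url.split("/")
    prefixes = ["/".join(pieces[:i]) for i in range(1, len(pieces) + 1)]
    return {dest_ip, *prefixes}


def access_allowed(source, dest_ip, url):
    """Return True to forward the request (step 211), False to deny (step 210)."""
    group = GROUP_SOURCE.get(source, "default")       # steps 201 and 202
    wanted = candidates(dest_ip, url)                 # step 206
    for category in ACTIVE.get(group, set()):         # step 205
        if wanted & CATEGORY_208.get(category, set()):  # steps 207 and 209
            return False
    return True


print(access_allowed("10.0.0.52", "198.51.100.7", "www.xxx.com/pornography/photos"))
```

Notification of the client (step 214) and logging (step 215) would hang off the deny branch and are left out of the sketch.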

Step 212 then begins to receive the web page or other content data packets and step 213, which may be optional, can filter the incoming data in the returned data packets for objectionable data, such as profanity occurring in the text of web pages or news groups or other objectionable content as may be defined. That is, content filtering may also be incorporated into the invention as data is returned from the servers. This is beneficial and overcomes the problems of the prior art content filtering systems since in this invention, the content filtering can be centralized at the network device 100, rather than administering many separate clients that each contain their own content filtering database.
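The optional filtering of returned data in step 213 could take many forms. One simple possibility, masking whole words drawn from a centrally held list, is sketched below. This is illustrative only. The word list and function name are hypothetical placeholders, and the specification does not state how objectionable content is defined or handled.

```python
import re

# Hypothetical placeholder list, held once at the network device.
OBJECTIONABLE = {"darn", "heck"}


def filter_text(text, words=OBJECTIONABLE):
    """Replace each listed whole word with asterisks, ignoring case."""
    if not words:
        return text
    alternatives = "|".join(re.escape(word) for word in sorted(words))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), text)


print(filter_text("Well, DARN it"))
```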

In this manner, the present invention provides a robust data access filtering system that provides access control based on users, categories and times of use and not purely on content of data being accessed. This is beneficial since content filtering alone often overlooks objectionable material such as pornographic images, which contain no words to content filter upon.

Moreover, the present invention is centralized to offer ease of administration and configuration and is very flexible since times of day for restricted access may also be specified, if desired. By having a category database 208 that may be maintained offsite, by a third party for example, the invention allows the administrator to only have to worry about initial group/source configurations, and not worry about database maintenance. New client computers that suddenly appear or get installed on LAN 40, that are not yet listed in the group/source database, can be assigned a default group that has highly restricted access associated to it in this invention. In this manner, the invention can handle future LAN 40 client expansion without having to further configure the new clients for access control.

While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the claims.

1. A hardware network device for controlling access by clients on a private network to data files stored at servers in a public network, the hardware network device being interconnected between the private network and the public network, the hardware network device comprising: a first interface receiving a request from one of the clients on the private network to access one of the data files stored at servers in the public network; an access control processor coupled to the first interface, the access control processor analyzing data in the request from the one of the clients and determining if the request should be forwarded to the public network for processing by a server, of the servers in the public network, to which the request is destined, the determination being made by cross referencing resource identifier information in the request with access control data in at least one access control database, the access control data containing categorized resource identifier information, the categorized resource identifier information specifying a content subject matter category to which the one of the data files is assigned, and the categorized resource identifier information associated with each data file so categorized being assigned by prior locating of each data file, storing data file information comprising a uniform resource locator for each data file in a first database, reading the data file information for each data file from the first database, human interpretation of the content in each data file, and then, as a result of such human interpretation, determining a subject matter category to which each data file is to be assigned, the data file stored at the servers on the public network and storing said data file information and said subject matter category in the access control database; a second interface coupled between the first interface and the public network and coupled to the access control processor, the second interface forwarding the request from the first interface to the servers in the public network if the access control processor determines the request should be forwarded to the public network for processing by the server to which the request is destined; and means for permitting a network administrator of the public network to control the operation of the hardware network device.
2. The hardware network device of claim 1, wherein the access control database is stored locally on a storage medium within the hardware network device.
3. The hardware network device of claim 2, wherein the access control database is downloaded by a download process on the hardware network device onto the storage medium from an access control server.
4. The hardware network device of claim 3, wherein the download process is automatically performed at regular intervals.
5. The hardware network device of claim 3, wherein the download process is a subscription service with which the hardware network device must be registered so that the download process can be performed.
6. The hardware network device of claim 1, wherein the access control database is stored remotely on at least one access control server on the private network and access to the access control data in the access control database by the hardware network device is performed by accessing the access control server.
7. The hardware network device of claim 1, wherein the access control database is stored remotely on at least one access control server on the public network and access to the access control data in the access control database by the hardware network device is performed by accessing the access control server.
8. The hardware network device of claim 6, wherein access to the access control data is a subscription service with which the hardware network device must be registered in order to be allowed access to the access control data.
9. The hardware network device of claim 1, wherein: the request includes a source designation and the resource identifier information of the request specifies a destination of the request; the categorized resource identifier information in the access control data is categorized by associating predetermined destinations to specific categories of content; and the access control processor determines if the one of the clients making the request is associated with a category of content which contains a predetermined destination having a portion that is equal to the destination specified in the resource identifier information of the request.
10. The hardware network device of claim 9, wherein the portion that is equal to the destination specified in the resource identifier information of the request is a segment of the resource identifier information.
11. The hardware network device of claim 9, wherein the resource identifier information of the request is an internet protocol address.
12. The hardware network device of claim 9, wherein the categorized resource identifier information in the access control database is categorized by searching for uncategorized content provided by the servers located in the public network and presenting the uncategorized content of the data files to humans for evaluation and categorization to produce categorized content, the categorized content being represented in the access control database by an identification of a location of the categorized content on the servers in the public network.
13. The hardware network device of claim 12, wherein the uncategorized content provided by the servers in the public network is discovered by a network walker process which records new content destinations as they are discovered.
14. The hardware network device of claim 1, wherein: the request includes a source designation and the resource identifier information of the request specifies a destination of the request and the at least one access control database includes a group-source database and the access control processor, in determining if the request should be forwarded to the public network, matches the source designation of the request to the group-source database to determine the group of the one of the clients making the request.
15. The hardware network device of claim 14, wherein: the at least one access control database further includes a group-category database and the access control processor, in determining if the request should be forwarded to the public network, matches the group of the one of the clients making the request to at least one category to determine which categories of content may be accessed by that group.
16. The hardware network device of claim 14, wherein: at least one access control database further includes a category-destination database and the access control processor, in determining if the request should be forwarded to the public network, attempts to match the destination specified in the resource identifier information to at least one resource identifier destination listed within categories in the category-destination database, and if a match is made, the access control processor denies access to the server to which the request is destined.
17. The hardware network device of claim 16, wherein the access control processor, in determining if the request should be forwarded to the public network, matches the group of the one of the clients making the request to at least one category having an associated block of allowed access times, to determine which categories of content may be accessed by that group and at which times.
18. A method executing on a first client computer connected to a public network and on an access controller connected to a private network, the method being for controlling access by clients of the private network to data files stored on servers connected in the public network, the method comprising the steps of: at the first client computer connected to the public network, using the first client computer to: search for uncategorized data files being stored on servers connected in the public network, the uncategorized data files being available on demand; store data file information comprising at least a uniform resource locator (URL) for each of the uncategorized data files in at least one initial database; retrieve one or more selected data files from the initial database, at a time after the step of using the first client computer to store data file information in the at least one initial database; present a view of each selected data file in human readable form on the first client computer connected to the public network; permit a human being to review the contents of each selected data file so presented; associate, with each selected data file, a determined content rating for each selected data file in response to presenting the contents of the selected data file to a human being, the content rating being determined as a result of the human being assigning the selected data file to at least one content subject matter category; and store a uniform resource locator (URL) of each selected data file together with the associated content subject matter category in a category-destination database; at an access controller connected to the private network, using the access controller to: download the category-destination database; receive requests from second client computers connected to the private network, the requests from the second client computers indicating requested data files stored on the servers connected in the public network; analyze the data in each request from a client computer of the second client computers against the data from the category-destination database; and determine whether to forward the request from the client computer of the second client computers to a server of the servers connected in the public network for processing, the determination being made based upon the content rating of the requested data file.
 19. The method of claim 18, wherein the stepof analyzing using the access controller to analyze the data in eachrequest further comprises the steps of using the access controller to:examiningexamine a source of the request against a group-source databaseto determine a group associated with the client making the request;examiningexamine the group associated with the client making the requestagainst a group-category database to determine the content ratings thatthe group may access; obtainingobtain URL information from the request;and determiningdetermine if the URL information has been assigned acontent rating that the group may access, and if so, allowingusing theaccess controller to allow the request, and if not, denyingusing theaccess controller to deny the request.
 20. The method of claim 18, further comprising the step of using the access controller to filter contents of return data sent from servers connected in the public network in response to a request which is allowed.
 21. The method of claim 18, wherein the URL information is an Internet Protocol (IP) address.
 22. The method of claim 18, wherein the URL information is a world wide web page address.
 23. The method of claim 18, wherein the URL information is a portion of a world wide web page address.
 24. The method of claim 18, wherein the using the access controller to download is automatically performed at regular intervals.
 25. The method of claim 24, wherein the using the access controller to download is a subscription service to which the access controller must be registered so that the using the access controller to download can be performed.
 26. The method of claim 18, wherein the step of using the first client computer to search for new uncategorized data files on the public network is performed by a network walker process.
 27. The method of claim 19, wherein the group-category database includes at least one group that is associated with different content ratings depending on the time of day of the request.
 28. A hardware network device according to claim 1, the hardware network device comprising one or more processors and one or more memories operable to store program instructions executable by the one or more processors to implement: the first interface, the access control processor, the second interface, and the means for permitting the network administrator of the private network to control the operation of the hardware network device.
 29. A hardware network device according to claim 28, wherein: the request includes a source designation and the resource identifier information of the request specifies a destination of the request and the at least one access control database includes a group-source database and the access control processor, in determining if the request should be forwarded to the public network, matches the source designation of the request to the group-source database to determine the group of the one of the clients making the request.
 30. A hardware network device according to claim 29, wherein: the at least one access control database further includes a group-category database and the access control processor, in determining if the request should be forwarded to the public network, matches the group of the one of the clients making the request to at least one category to determine which categories of content may be accessed by that group.
 31. A hardware network device according to claim 28, wherein the access control database is stored remotely on at least one access control server on the public network and access to the access control data in the access control database by the hardware network device is performed by accessing the access control server.
 32. A hardware network device according to claim 1, the categorized resource identifier information associated with each data file so categorized being further assigned by, prior to storing the data file information comprising the uniform resource locator for each data file in the first database, determining whether the data file information comprising the uniform resource locator is already stored in either a queue database, the first database or the access control database and, if not, initially storing the data file information comprising the uniform resource locator in the queue database.
 33. A method according to claim 18, wherein the at least one initial database comprises (i) a queue database for holding the URLs associated with the uncategorized data files and (ii) an uncategorized database, wherein the using the first client computer to retrieve one or more selected data files retrieves such files from the uncategorized database, and wherein the using the first client computer to store data file information further comprises: if the data file information located in the using the first client computer to search for uncategorized data files is not already stored in either the queue database, the uncategorized database, or the category-destination database, then using the first client computer to store the data file information in the queue database.
 34. A method according to claim 33, further comprising: using the first client computer to obtain further information for the data file information located in the using the first client computer to search for uncategorized data files, the further information including information other than the URL, and using the first client computer to store that information in the uncategorized database.
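
By way of illustration only, the access decision recited in claims 18, 19 and 27 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the three dictionaries are hypothetical in-memory stand-ins for the group-source, group-category and category-destination databases, the addresses, group names, categories and hosts are invented for the example, and the policy of denying unknown sources and uncategorized URLs is an assumption that the claims do not specify.

```python
from datetime import time
from urllib.parse import urlsplit

# Hypothetical stand-in for the group-source database: source address -> group.
GROUP_SOURCE = {"10.0.0.5": "students", "10.0.0.9": "staff"}

# Hypothetical stand-in for the group-category database. Each group maps to
# half-open time windows [start, end) and the categories allowed in that
# window, which models the time-of-day dependence of claim 27.
GROUP_CATEGORY = {
    "students": [
        (time(8, 0), time(15, 0), {"education"}),
        (time(15, 0), time(22, 0), {"education", "sports"}),
    ],
    "staff": [(time.min, time.max, {"education", "sports", "news"})],
}

# Hypothetical stand-in for the category-destination database: host or
# host-plus-path prefix -> categories assigned by human reviewers.
CATEGORY_DESTINATION = {
    "example.edu": {"education"},
    "example.com/scores": {"sports"},
}


def categories_for(url):
    """Return the categories of the longest matching entry, or None."""
    parts = urlsplit(url if "//" in url else "//" + url)
    key = (parts.hostname or "") + parts.path.rstrip("/")
    while key:
        if key in CATEGORY_DESTINATION:
            return CATEGORY_DESTINATION[key]
        if "/" not in key:
            break
        key = key.rsplit("/", 1)[0]  # drop the last path segment and retry
    return None


def allowed_categories(group, now):
    """Union of the categories the group may access at time `now`."""
    allowed = set()
    for start, end, categories in GROUP_CATEGORY.get(group, []):
        if start <= now < end:
            allowed |= categories
    return allowed


def should_forward(source, url, now):
    """Decide whether a request from `source` for `url` is forwarded."""
    group = GROUP_SOURCE.get(source)
    if group is None:
        return False  # assumption: requests from unknown sources are denied
    categories = categories_for(url)
    if categories is None:
        return False  # assumption: uncategorized URLs are denied
    return bool(categories & allowed_categories(group, now))
```

In this sketch, a call such as should_forward("10.0.0.5", "http://example.com/scores/today", time(16, 0)) follows the order of claim 19: the source is mapped to a group, the group is mapped to the categories it may access at that time of day, and the URL information from the request is matched against the category-destination data. Matching on progressively shorter prefixes is one way to accommodate URL information that is only a portion of a web page address, as in claim 23.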